 subband signal


SIMD-size aware weight regularization for fast neural vocoding on CPU

arXiv.org Artificial Intelligence

This paper proposes weight regularization for a faster neural vocoder. Pruning time-consuming DNN modules is a promising way to realize a real-time vocoder (e.g., WaveRNN, LPCNet) on a CPU. Regularization that encourages sparsity is also effective in avoiding the quality degradation caused by pruning. However, for fast vocoding, the weights of each matrix must be ordered contiguously in SIMD-size units. To ensure this ordering, we propose explicit SIMD-size-aware regularization. Our proposed method reshapes a weight matrix into a tensor so that the weights are aligned by group size in advance, and then computes a group-Lasso-like regularization loss. Experiments on a 70% sparse subband WaveRNN show that pruning with conventional Lasso and column-wise group Lasso degrades the naturalness of the synthesized speech. The vocoder with the proposed regularization 1) achieves naturalness comparable to that of the unpruned vocoder and 2) runs meaningfully faster than other conventional vocoders that use regularization.
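The abstract describes the regularizer only at a high level. The sketch below illustrates the general idea under stated assumptions: a `(rows, cols)` weight matrix is reshaped so that each group is `simd` consecutive weights along a row, the penalty is the sum of per-group L2 norms (group Lasso over SIMD-sized blocks), and pruning then zeroes whole blocks with the smallest norms. The function names, the row-wise grouping, and the magnitude-based pruning rule are hypothetical choices for illustration, not the paper's exact formulation.

```python
import numpy as np

def simd_groups(W, simd=8):
    """View W (rows, cols) as (rows, cols // simd, simd) blocks of consecutive weights.

    Assumption: groups run along rows; the paper may group differently.
    """
    rows, cols = W.shape
    if cols % simd != 0:
        raise ValueError("number of columns must be a multiple of the SIMD width")
    return W.reshape(rows, cols // simd, simd)

def simd_group_lasso(W, simd=8):
    """Group-Lasso-like penalty: sum of the L2 norms of the SIMD-sized blocks."""
    groups = simd_groups(W, simd)
    return np.sqrt((groups ** 2).sum(axis=-1)).sum()

def prune_blocks(W, sparsity=0.7, simd=8):
    """Zero out the fraction `sparsity` of blocks with the smallest L2 norms."""
    groups = simd_groups(W, simd)
    norms = np.sqrt((groups ** 2).sum(axis=-1))
    n_prune = int(round(sparsity * norms.size))
    keep = np.ones(norms.size, dtype=bool)
    keep[np.argsort(norms, axis=None)[:n_prune]] = False  # flattened block indices
    keep = keep.reshape(norms.shape)
    # Broadcast the block mask over the SIMD lane axis, then restore the 2-D shape.
    return (groups * keep[..., None]).reshape(W.shape)

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
W_pruned = prune_blocks(W, sparsity=0.7, simd=8)
print("penalty before pruning:", simd_group_lasso(W))
print("zero fraction after pruning:", np.mean(W_pruned == 0))
```

In training, a penalty of this form would be added to the task loss with some weight and differentiated by an autodiff framework; NumPy is used here only to show the reshaping and grouping. Because every pruned block is a full run of `simd` weights, the surviving nonzeros stay aligned to SIMD-width boundaries, unlike element-wise Lasso, which can leave isolated nonzeros anywhere.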


A Fully Time-domain Neural Model for Subband-based Speech Synthesizer

arXiv.org Artificial Intelligence

This paper introduces a deep neural network model for a subband-based speech synthesizer. The model exploits the narrow bandwidth of the subband signals to reduce the complexity of the time-domain speech generator. We employed multi-level wavelet analysis/synthesis to decompose the signal into subbands and to reconstruct it, in the time domain. Inspired by WaveNet, a convolutional neural network (CNN) model predicts the subband speech signals fully in the time domain. Because the subbands have narrow bandwidth, a simple network architecture is enough to learn their simple patterns accurately. In ground-truth experiments with teacher forcing, the subband synthesizer significantly outperforms the fullband model in terms of both subjective and objective measures. In addition, by conditioning the model on the phoneme sequence using a pronunciation dictionary, we have achieved a fully time-domain neural model for a subband-based text-to-speech (TTS) synthesizer, which is nearly end-to-end. The speech generated by the subband TTS shows quality comparable to that of the fullband model, with a lighter network architecture for each subband.
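The abstract does not name the wavelet or the number of decomposition levels. As a minimal sketch, assuming an orthonormal Haar wavelet, multi-level analysis repeatedly splits the current low band into a half-rate low band and a half-rate high (detail) band, and synthesis undoes the splits exactly. Each subband is a shorter, downsampled time-domain signal, which is what allows a per-band generator to stay small. The function names are illustrative.

```python
import numpy as np

def haar_analysis(x):
    """One level of orthonormal Haar analysis: returns (low, high), each half length."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2 != 0:
        raise ValueError("signal length must be even")
    even, odd = x[0::2], x[1::2]
    low = (even + odd) / np.sqrt(2)   # approximation (low-pass) band
    high = (even - odd) / np.sqrt(2)  # detail (high-pass) band
    return low, high

def haar_synthesis(low, high):
    """Inverse of haar_analysis: interleave the reconstructed even/odd samples."""
    x = np.empty(2 * len(low))
    x[0::2] = (low + high) / np.sqrt(2)
    x[1::2] = (low - high) / np.sqrt(2)
    return x

def multilevel_analysis(x, levels):
    """Return [d1, d2, ..., d_levels, a_levels]; len(x) must be divisible by 2**levels."""
    bands, approx = [], x
    for _ in range(levels):
        approx, detail = haar_analysis(approx)
        bands.append(detail)
    bands.append(approx)
    return bands

def multilevel_synthesis(bands):
    """Reconstruct the signal from the output of multilevel_analysis."""
    approx = bands[-1]
    for detail in reversed(bands[:-1]):
        approx = haar_synthesis(approx, detail)
    return approx

x = np.random.default_rng(0).standard_normal(1024)
bands = multilevel_analysis(x, levels=3)
print([len(b) for b in bands])  # [512, 256, 128, 128]
print(np.max(np.abs(multilevel_synthesis(bands) - x)))  # reconstruction error near machine precision
```

Haar gives an octave-band (dyadic) split, so the subbands have unequal widths; a practical system may use a longer wavelet filter for better frequency selectivity.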


Wideband Time-Domain Digital Backpropagation via Subband Processing and Deep Learning

arXiv.org Machine Learning

We propose a low-complexity sub-banded DSP architecture for digital backpropagation (DBP) in which the walk-off effect is compensated using simple delay elements. For a simulated 96-Gbaud signal and a 2500 km optical link, our method achieves a 2.8 dB SNR improvement over linear equalization. The FIR filters can be as short as 3 taps per split-step Fourier method (SSFM) step, provided that the step size is sufficiently small (i.e., many steps are used) and the filters in all steps are jointly optimized [6]. The complexity of time-domain DBP (TD-DBP) is dominated by the total number of chromatic-dispersion (CD) filter taps in all steps. Recent work has focused on relatively narrowband signals (e.g., 10 Gbaud in [6]). Since the memory increases quadratically with bandwidth, it is not clear whether TD-DBP can also be scaled gracefully to more wideband signals. In this paper, we consider a 96-Gbaud signal for which the delay spread per 100 km amounts to 125 symbol periods.
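The 125-symbol figure and the quadratic growth can be checked with the standard CD delay-spread estimate Δτ = D·L·Δλ with Δλ ≈ λ²·B/c. The excerpt does not state the fiber parameters, so the sketch below assumes standard single-mode fiber with D = 17 ps/(nm·km) at λ = 1550 nm and takes the signal bandwidth B equal to the symbol rate (no roll-off). Under those assumptions, the spread in symbol periods is D·L·λ²·B²/c, which is quadratic in B. The function name is hypothetical.

```python
# Chromatic-dispersion delay spread expressed in symbol periods.
# Assumptions (not stated in the excerpt): standard single-mode fiber with
# D = 17 ps/(nm km) at 1550 nm, and optical bandwidth equal to the symbol rate.

C = 299_792_458.0  # speed of light in vacuum, m/s

def delay_spread_symbols(symbol_rate_hz, length_m,
                         dispersion_ps_nm_km=17.0, wavelength_m=1550e-9):
    """Return the CD-induced delay spread divided by the symbol period."""
    # Convert ps/(nm km) to SI units s/m^2: 1 ps/(nm km) = 1e-12 / (1e-9 * 1e3) = 1e-6
    d_si = dispersion_ps_nm_km * 1e-6
    # Spectral width in wavelength for bandwidth B: d_lambda = lambda^2 * B / c
    d_lambda = wavelength_m ** 2 * symbol_rate_hz / C
    spread_s = d_si * length_m * d_lambda  # delay spread in seconds
    return spread_s * symbol_rate_hz       # divide by the symbol period 1/B

n96 = delay_spread_symbols(96e9, 100e3)
n10 = delay_spread_symbols(10e9, 100e3)
print(f"96 Gbaud: {n96:.1f} symbols per 100 km")
print(f"10 Gbaud: {n10:.2f} symbols per 100 km")
print(f"ratio: {n96 / n10:.2f} (= (96/10)^2 = {(96 / 10) ** 2:.2f})")
```

With these assumed parameters, the 96-Gbaud case gives about 125.6 symbol periods per 100 km, consistent with the 125 quoted above, while a 10-Gbaud signal spreads over only about 1.4 symbols. The ratio is (96/10)² ≈ 92, which is why filter memory, and hence TD-DBP complexity, grows so quickly with bandwidth.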


Source Separation with a Sensor Array using Graphical Models and Subband Filtering

Neural Information Processing Systems

Source separation is an important problem at the intersection of several fields, including machine learning, signal processing, and speech technology. Here we describe new separation algorithms which are based on probabilistic graphical models with latent variables. In contrast with existing methods, these algorithms exploit detailed models to describe source properties. They also use subband filtering ideas to model the reverberant environment, and employ an explicit model for background and sensor noise. We leverage variational techniques to keep the computational complexity per EM iteration linear in the number of frames.
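The abstract does not give the model equations. One way to use subband filtering for reverberation, assumed here, is to approximate each room impulse response in the STFT domain by a short filter across frames in every frequency bin, so that sensor i observes X_i[m, k] = Σ_j Σ_l H_ij[l, k] S_j[m - l, k] plus sensor noise. The sketch below only simulates this forward (generative) model with hypothetical names; it does not implement the paper's graphical model or its variational EM. Its cost grows linearly with the number of frames.

```python
import numpy as np

def subband_mix(S, H, noise_std=0.0, rng=None):
    """Simulate sensor signals from sources under a per-subband filter model.

    S: (n_src, n_frames, n_bins) complex subband (STFT) source signals.
    H: (n_sensors, n_src, n_taps, n_bins) complex filters: one short filter
       across frames for each sensor/source pair and frequency bin.
    Returns X: (n_sensors, n_frames, n_bins), optionally with complex Gaussian noise.
    """
    n_src, n_frames, n_bins = S.shape
    n_sensors, n_src_h, n_taps, n_bins_h = H.shape
    if n_src_h != n_src or n_bins_h != n_bins or n_taps > n_frames:
        raise ValueError("incompatible shapes")
    X = np.zeros((n_sensors, n_frames, n_bins), dtype=complex)
    for l in range(n_taps):
        # X[i, m, k] += sum_j H[i, j, l, k] * S[j, m - l, k]  for m >= l
        X[:, l:, :] += np.einsum("ijk,jmk->imk", H[:, :, l, :], S[:, :n_frames - l, :])
    if noise_std > 0:
        rng = np.random.default_rng() if rng is None else rng
        noise = rng.standard_normal(X.shape) + 1j * rng.standard_normal(X.shape)
        X += noise_std / np.sqrt(2) * noise  # unit-variance complex noise, scaled
    return X

rng = np.random.default_rng(1)
S = rng.standard_normal((2, 10, 4)) + 1j * rng.standard_normal((2, 10, 4))
H = np.zeros((2, 2, 3, 4), dtype=complex)
H[0, 0, 0] = 1.0  # sensor 0 hears source 0 directly
H[1, 1, 2] = 0.5  # sensor 1 hears source 1 delayed by 2 frames and attenuated
X = subband_mix(S, H)
```

Modeling the reverberation with a few taps per subband, rather than one long time-domain filter, keeps each frequency bin's dependencies local in time, which is the kind of structure that makes per-iteration inference cost linear in the number of frames.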

